Results 1 - 8 of 8
1.
Article in English | MEDLINE | ID: mdl-27570656

ABSTRACT

Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation.
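
To make the task concrete, here is a minimal sketch of a rule-based sentence splitter in Python. It is not the code from the paper or from any of the evaluated toolkits; the abbreviation list and the `split_sentences` function are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"dr.", "pt.", "vs.", "e.g.", "i.e."}

# Candidate boundary: sentence-final punctuation followed by whitespace.
BOUNDARY = re.compile(r"[.!?]\s+")


def split_sentences(text):
    """Split text at candidate boundaries, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.start() + 1  # keep the punctuation mark with the sentence
        words = text[start:end].split()
        last_token = words[-1].lower() if words else ""
        if last_token in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Pt. denies chest pain. Seen by Dr. Smith today."))
```

A punctuation-driven rule like this has nothing to go on when a note separates statements only by line breaks, for example `"BP 120/80 HR 72\nNo acute distress"`, which it returns as a single sentence. This is one plausible way a general-domain splitter could degrade on clinical text; it is an illustration, not a finding taken from the abstract.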

2.
AMIA Annu Symp Proc ; 2016: 1149-1158, 2016.
Article in English | MEDLINE | ID: mdl-28269912

ABSTRACT

Clinical trial coordinators refer to both structured and unstructured sources of data when evaluating a subject for eligibility. While some eligibility criteria can be resolved using structured data, some require manual review of clinical notes. An important step in automating the trial screening process is to be able to identify the right data source for resolving each criterion. In this work, we discuss the creation of an eligibility criteria dataset for clinical trials for patients with two disparate diseases, annotated with the preferred data source for each criterion (i.e., structured or unstructured) by annotators with medical training. The dataset includes 50 heart-failure trials with a total of 766 eligibility criteria and 50 trials for chronic lymphocytic leukemia (CLL) with 677 criteria. Further, we developed machine learning models to predict the preferred data source: kernel methods outperform simpler learning models when used with a combination of lexical, syntactic, semantic, and surface features. Evaluation of these models indicates that the performance is consistent across data from both diagnoses, indicating generalizability of our method. Our findings are an important step towards ongoing efforts for automation of clinical trial screening.


Subjects
Clinical Trials as Topic , Electronic Health Records , Natural Language Processing , Patient Selection , Eligibility Determination/methods , Heart Failure , Humans , Information Storage and Retrieval , Leukemia, Lymphocytic, Chronic, B-Cell , Machine Learning
3.
J Biomed Inform ; 58 Suppl: S103-S110, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26375493

ABSTRACT

The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely available resources from the community. With a performance (F1=90.7) that is significantly higher than the median (F1=87.20) and close to the top performing system (F1=92.8), it was the best rule-based system of all the submissions in the challenge. We also used this system to evaluate the utility of different terminologies in the UMLS towards the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on the performance, identifying negations and the experiencer of the medical event results in negligible gain.
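
For readers unfamiliar with the F1 figures quoted above, the following Python sketch shows how an F1 score is typically derived from true-positive, false-positive and false-negative counts. The function name `f1_score` and the example counts are assumptions for illustration; they are not the challenge's official scorer or its data.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts.

    tp, fp, fn are the numbers of true positives, false positives
    and false negatives; returns 0.0 when the score is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 90 correct annotations, 10 spurious, 10 missed.
print(round(100 * f1_score(90, 10, 10), 1))
```

The abstracts report F1 on a 0-100 scale, so the sketch multiplies by 100 before printing.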


Subjects
Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Unified Medical Language System/organization & administration , Aged , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Coronary Artery Disease/diagnosis , Coronary Artery Disease/epidemiology , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , Ohio/epidemiology , Pattern Recognition, Automated/methods , Risk Assessment/methods , Terminology as Topic , Vocabulary, Controlled
4.
J Biomed Inform ; 58 Suppl: S211-S218, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26376462

ABSTRACT

Clinical trials are essential for determining whether new interventions are effective. In order to determine the eligibility of patients to enroll into these trials, clinical trial coordinators often perform a manual review of clinical notes in the electronic health record of patients. This is a very time-consuming and exhausting task. Efforts in this process can be expedited if these coordinators are directed toward specific parts of the text that are relevant for eligibility determination. In this study, we describe the creation of a dataset that can be used to evaluate automated methods capable of identifying sentences in a note that are relevant for screening a patient's eligibility in clinical trials. Using this dataset, we also present results for four simple methods in natural language processing that can be used to automate this task. We found that this is a challenging task (maximum F-score=26.25), but it is a promising direction for further research.


Subjects
Clinical Trials as Topic/methods , Data Mining/methods , Electronic Health Records/organization & administration , Eligibility Determination/methods , Natural Language Processing , Patient Selection , Humans , Pattern Recognition, Automated/methods , Vocabulary, Controlled
5.
BMC Med Inform Decis Mak ; 14: 65, 2014 Aug 04.
Article in English | MEDLINE | ID: mdl-25091637

ABSTRACT

BACKGROUND: Readmissions after hospital discharge are a common occurrence and are costly for both hospitals and patients. Previous attempts to create universal risk prediction models for readmission have not met with success. In this study we leveraged a comprehensive electronic health record to create readmission-risk models that were institution- and patient-specific in an attempt to improve our ability to predict readmission. METHODS: This is a retrospective cohort study performed at a large midwestern tertiary care medical center. All patients with a primary discharge diagnosis of congestive heart failure, acute myocardial infarction or pneumonia over a two-year time period were included in the analysis. The main outcome was 30-day readmission. Demographic, comorbidity, laboratory, and medication data were collected on all patients from a comprehensive information warehouse. Using multivariable analysis with stepwise removal we created three disease-specific risk prediction models and a combined model. These models were then validated on separate cohorts. RESULTS: 3572 patients were included in the derivation cohort. Overall there was a 16.2% readmission rate. The acute myocardial infarction and pneumonia readmission-risk models performed well on a random sample validation cohort (AUC range 0.73 to 0.76) but less well on a historical validation cohort (AUC 0.66 for both). The congestive heart failure model performed poorly on both validation cohorts (AUC 0.63 and 0.64). CONCLUSIONS: The readmission-risk models for acute myocardial infarction and pneumonia validated well on a contemporary cohort, but not as well on a historical cohort, suggesting that models such as these need to be continuously trained and adjusted to respond to local trends. The poor performance of the congestive heart failure model may suggest that for chronic disease conditions social and behavioral variables are of greater importance and improved documentation of these variables within the electronic health record should be encouraged.
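
The AUC values reported in this abstract can be read as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen patient who was not readmitted. The Python sketch below computes AUC directly from that definition; the function name `auc` and the toy labels and scores are assumptions for illustration, not the study's data or code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for readmitted, 0 for not readmitted.
    scores: predicted risk for each patient (higher means riskier).
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with four patients.
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formulation would be the usual choice for large cohorts.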


Subjects
Electronic Health Records/statistics & numerical data , Heart Diseases/therapy , Patient Readmission/statistics & numerical data , Pneumonia/therapy , Aged , Female , Humans , Male , Middle Aged , Models, Statistical , Retrospective Studies , Risk Assessment
6.
J Am Med Inform Assoc ; 21(2): 221-30, 2014.
Article in English | MEDLINE | ID: mdl-24201027

ABSTRACT

OBJECTIVE: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. MATERIALS AND METHODS: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. RESULTS: Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. DISCUSSION: We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. CONCLUSIONS: There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.


Subjects
Artificial Intelligence , Data Mining/methods , Electronic Health Records , Natural Language Processing , Diagnosis , Humans , Phenotype , Statistics as Topic , Vocabulary, Controlled
7.
Stud Health Technol Inform ; 192: 1177, 2013.
Article in English | MEDLINE | ID: mdl-23920951

ABSTRACT

Genomic predictions of clinical outcome are a core promise of the Human Genome Project. Yet actionable biomarkers in clinical medicine are confounded by patient heterogeneity as patient phenotypes are rarely well characterized and often poorly understood. Furthermore, standard predictive algorithms rely on a priori knowledge of discrete phenotypes for feature selection and training. To address this limitation, we develop a classifier-free algorithm that matches individual patients to other patient outcomes based on optimized clinicopathologic feature integration and molecular pathway similarity using the K-nearest neighbor algorithm. By identifying the best matches within the collection of patient data, we are able to return the desired prediction. In prostate cancer, we demonstrate the algorithm's ability to predict cancer recurrence without the need for supervised learning techniques in independent datasets with a recall and precision of 78%. Importantly, the predictor is microarray platform independent, scalable and simple to implement. Taken together, this method provides an exciting foundation from which data-driven clinical decision-making may arise.
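
As a rough illustration of matching a patient to the most similar patients with known outcomes, the Python sketch below implements a plain k-nearest-neighbor vote over numeric feature vectors. It is not the authors' algorithm: the feature integration and pathway-similarity steps described in the abstract are not reproduced, and the Euclidean distance, the `knn_predict` function, and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(query, patients, k=3):
    """Return the most common outcome among the k nearest patients.

    query: feature vector of the patient to classify.
    patients: list of (feature_vector, outcome) pairs with known outcomes.
    Distance is plain Euclidean distance (an assumption for this sketch).
    """
    ranked = sorted(patients, key=lambda p: math.dist(query, p[0]))
    votes = Counter(outcome for _, outcome in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy cohort with two made-up features per patient.
cohort = [
    ([0.0, 0.0], "no recurrence"),
    ([0.0, 1.0], "no recurrence"),
    ([5.0, 5.0], "recurrence"),
    ([6.0, 5.0], "recurrence"),
    ([5.0, 6.0], "recurrence"),
]
print(knn_predict([5.5, 5.5], cohort, k=3))
```

Because the prediction is read off the nearest stored cases, no model is fitted in advance, which is the general property the abstract appeals to when it calls the approach classifier-free.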


Subjects
Artificial Intelligence , Genetic Predisposition to Disease/epidemiology , Genetic Predisposition to Disease/genetics , Neoplasm Recurrence, Local/epidemiology , Neoplasm Recurrence, Local/genetics , Prostatic Neoplasms/epidemiology , Prostatic Neoplasms/genetics , Algorithms , Gene Expression Profiling/methods , Genetic Markers/genetics , Humans , Male , Pattern Recognition, Automated/methods , Polymorphism, Single Nucleotide/genetics , Prognosis , Prostatic Neoplasms/diagnosis , Reproducibility of Results , Sensitivity and Specificity , Systems Integration
8.
Stud Health Technol Inform ; 192: 1100, 2013.
Article in English | MEDLINE | ID: mdl-23920874

ABSTRACT

We combined patient-level clinical data derived from the Electronic Health Record (EHR) with area-level environmental and socioeconomic data to study factors independently associated with overweight and obesity. Our multinomial logistic regression model showed that area-level factors such as farmers' markets, grocery stores and percent college-educated at the zip code level were significantly associated with the outcomes. However, mismatch in the granularity of community and clinical data limited us in creating a discriminatory model. While these results are promising, they reveal challenges that must be overcome in order to maximize secondary use of EHR data to further explore population health status.


Subjects
Data Mining/statistics & numerical data , Databases, Factual , Electronic Health Records/statistics & numerical data , Health Records, Personal , Medical Record Linkage/methods , Obesity/epidemiology , Population Surveillance/methods , Humans , Obesity/prevention & control , Ohio/epidemiology , Prevalence , Systems Integration